Abstract

We present a first-order theory of sequences with integer elements,
Presburger arithmetic, and regular constraints, which can model significant
properties of data structures such as arrays and lists. We give a
decision procedure for the quantifier-free fragment, based on an encoding
into the first-order theory of concatenation; the procedure has PSPACE
complexity. The quantifier-free fragment of the theory of sequences can
express properties such as sortedness and injectivity, as well as Boolean
combinations of periodic and arithmetic facts relating the elements of
the sequence and their positions (e.g., "for all even i's, the element at
position i has value i + 3 or 2i"). The resulting expressive power is
orthogonal to that of the most expressive decidable logics for arrays. Some
examples demonstrate that the fragment is also suitable to reason about
sequence-manipulating programs within the standard framework of
axiomatic semantics.
1 Introduction



Verification is undecidable already for simple programs, but modern program- 
ming languages support a variety of sophisticated features that make it all the 
more complicated. These advanced features — such as arrays, pointers, dynamic 
allocation of resources, and object-oriented abstract data types — are needed 
because they raise the level of abstraction thus making programmers more pro- 
ductive and programs less buggy. Verification techniques have also progressed 
rapidly over the years, in an attempt to keep the pace with the development of 
programming languages. 

Automated verification requires expressive program logics and powerful de- 
cision procedures. In response to the evolution of modern programming lan- 
guages, new decidable program logic fragments and combination techniques for 
different fragments have mushroomed especially in recent years. Many of the 
most successful contributions have focused on verifying relatively restricted as- 
pects of a program's behavior, for example by decoupling pointer structure and 
functional properties in the formal analysis of a dynamic data structure. This 
narrowing choice, partly deliberate and partly required by the formidable dif- 
ficulty of the various problems, is effective because different aspects are often 
sufficiently decoupled so that each of them can be analyzed in isolation with the 
most appropriate, specific technique. 

This paper contributes to the growing repertory of special program logics by 
exploring the decidability of properties of sequences of elements of homogeneous 
type. These can abstract fundamental features of several data structures: arrays 
imprimis, but also the sequence of values stored in a dynamically allocated list, 
or the content of a stack or a queue. 

We take a new angle on reasoning about sequences, based on the theory of 
concatenation: a first-order theory where variables are interpreted as words (or 
sequences) over a finite alphabet and can be composed by concatenating them. 
Makanin's algorithm for solving word equations [32] implies the decidability of
the quantifier-free fragment of the theory of concatenation. Based on this, we
introduce a first-order theory of sequences $\mathcal{T}_{seq(\mathbb{Z})}$ whose elements are integers.
Section 3.2 presents a decision procedure for the quantifier-free fragment of
$\mathcal{T}_{seq(\mathbb{Z})}$, which encodes the validity problem into the quantifier-free theory of
concatenation. The decision procedure is in PSPACE; it is known, however,
that Makanin's algorithm is reasonably efficient in practice [T].

The theory of sequences $\mathcal{T}_{seq(\mathbb{Z})}$ allows concatenating sequences to build new
ones, and it includes Presburger arithmetic over elements of a sequence. On the 
other hand, it forbids explicit indexed access to elements, which differentiates it 
from the theory of arrays and extensions thereof (see Section 5). The resulting
quantifier-free fragment has significant expressiveness, in spite of its limitations
in representing subsequences of variable length. In particular, we show some 
interesting properties that are inexpressible in powerful decidable array logics 
(such as those in [8, 16, 21, 22]) but are expressible in our theory of sequences.
Conversely, there exist decidable properties of extensions of the theory of arrays 
that are inexpressible in $\mathcal{T}_{seq(\mathbb{Z})}$. These results support our claim that the theory
of sequences provides a fresh angle on reasoning about sequences, orthogonal to 
most approaches that model sequences as arrays. 

In order to better assess the limits of our theory of sequences, we also prove 
that several natural extensions of the quantifier-free fragment of $\mathcal{T}_{seq(\mathbb{Z})}$ are
undecidable. Finally, we demonstrate reasoning about sequence-manipulating
programs with annotations written in the quantifier-free fragment of $\mathcal{T}_{seq(\mathbb{Z})}$. A
couple of examples in Section 4 illustrate the usage of $\mathcal{T}_{seq(\mathbb{Z})}$ formulas with the
standard machinery of axiomatic semantics and backward reasoning. 



Paper outline. Section 2 presents the theory of concatenation and summarizes
a few decidability and undecidability results about it. Section 3 introduces
our theory of integer sequences $\mathcal{T}_{seq(\mathbb{Z})}$, demonstrates its expressiveness, provides
a decision procedure for its quantifier-free fragment, and shows undecidable
extensions of the theory. Section 4 illustrates how to use the theory $\mathcal{T}_{seq(\mathbb{Z})}$ to
reason about programs in the standard axiomatic semantics framework. Finally,
Section 5 reviews related work and Section 6 concludes by outlining future work.



2 The Theory of Concatenation 

This section introduces some basic notation (Section 2.1) and summarizes some
results about the first-order theory of concatenation (Section 2.2) that we will
use in the remainder of the paper.

In the rest of the paper, we assume familiarity with the standard syntax 
and terminology of first-order theories (e.g., [7]); in particular, we assume the 
standard abbreviations and symbols of first-order theories with the following 
operator precedence: $\neg$, $\wedge$, $\vee$, $\Rightarrow$, $\Leftrightarrow$, $\forall$ and $\exists$.

$FV(\phi)$ denotes the set of free variables of a formula $\phi$. With standard
terminology, a formula $\phi$ is a sentence iff it is closed iff $FV(\phi) = \emptyset$. Given a
regular expression $Q$ over $\{\exists, \forall\}$, the $Q$-fragment of a first-order theory is the set
of all formulas of the theory in the form $Q \bullet \psi$, where $\psi$ is quantifier-free. The
universal and existential fragments are synonyms for the $\forall^*$- and $\exists^*$-fragments,
respectively. A fragment is decidable iff the validity problem is decidable for
its sentences. It is customary to define the validity and satisfiability problems
for a quantifier-free formula $\psi$ as follows: $\psi$ is valid iff the universal closure
of $\psi$ is valid, and is satisfiable iff the existential closure of $\psi$ is valid. As
a consequence of this definition, the decidability of a quantifier-free fragment 
whose formulas are closed under negation is tantamount to the decidability of 
the universal or existential fragments. Correspondingly, in the paper we will 
allow some freedom in picking the terminology that is most appropriate to the 
context. 



2.1 Sequences and Concatenation 

$\mathbb{Z}$ denotes the set of integer numbers and $\mathbb{N}$ denotes the set of nonnegative
integers.

Given a set $A = \{a, b, c, \ldots\}$ of constants, a sequence over $A$ is any word
$v = v(1)v(2) \cdots v(n)$ for some $n \in \mathbb{N}$ where $v(i) \in A$ for all $1 \le i \le n$. The
symbol $\varepsilon$ denotes the empty sequence, for which $n = 0$. $|v| = n$ denotes the
length of $v$. $A^*$ denotes the set of all finite sequences over $A$ including $\varepsilon \notin A$.
It is also convenient to introduce the shorthand $v(k_1, k_2)$ with $k_1, k_2 \in \mathbb{Z}$ to
describe subsequences of a given sequence $v$; it is defined as follows.



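$$v(k_1, k_2) = \begin{cases}
v(k_1)v(k_1 + 1) \cdots v(k_2) & 1 \le k_1 \le k_2 \le |v| \\
v(k_1, |v| + k_2) & k_1 - |v| \le k_2 < 1 \le k_1 \\
v(|v| + k_1, |v| + k_2) & 1 - |v| \le k_1 \le k_2 < 1 \\
\varepsilon & \text{otherwise}
\end{cases}$$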



For two sequences $v_1, v_2 \in A^*$, $v_1 \circledast v_2$ denotes their concatenation: the sequence
$v_1(1) \cdots v_1(|v_1|)v_2(1) \cdots v_2(|v_2|)$. We will drop the concatenation symbol
whenever unambiguous.

The structure $\langle A^*, \circledast, \varepsilon \rangle$ is also referred to as the free monoid with generators
in $A$ and neutral element $\varepsilon$. The size $|A|$ is called rank of the free monoid and
it can be finite or infinite.
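
The subsequence shorthand can be made concrete with a small executable sketch. The following Python function is only an illustration, assuming sequences are represented as Python lists or strings; the name `subseq` is hypothetical and not part of the paper. Positions are 1-based, and non-positive indices count from the end of the sequence, so that position 0 denotes the last element.

```python
def subseq(v, k1, k2):
    """Return v(k1, k2) following the case definition of Section 2.1.

    Positions are 1-based; non-positive positions are offsets from the
    end of v (0 is the last element, -1 the one before it, and so on).
    """
    n = len(v)
    if 1 <= k1 <= k2 <= n:
        # Both bounds are ordinary positions.
        return v[k1 - 1:k2]
    if k1 - n <= k2 < 1 <= k1:
        # Only the right bound is relative to the end.
        return subseq(v, k1, n + k2)
    if 1 - n <= k1 <= k2 < 1:
        # Both bounds are relative to the end.
        return subseq(v, n + k1, n + k2)
    # Any other combination denotes the empty sequence.
    return v[:0]
```

For instance, `subseq(v, 1, 1)` and `subseq(v, 0, 0)` return the singleton sequences holding the first and the last element of a non-empty `v`, and `subseq(v, 1, -1)` drops the last element.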

2.2 Decidability in the Theory of Concatenation 
2.2.1 Syntax and Semantics 

The theory of concatenation is the first-order theory $\mathcal{T}_{cat}$ with signature

$$\Sigma_{cat} = \{\doteq, \circ, R\}$$

where $\doteq$ is the equality predicate,^1 $\circ$ is the binary concatenation function, and
$R = \{R_1, R_2, \ldots\}$ is a set of unary (monadic) predicate symbols called regularity
constraints. We sometimes write $R_i(x)$ as $x \in R_i$, and $\alpha \not\doteq \beta$ abbreviates
$\neg(\alpha \doteq \beta)$.

An interpretation of a formula in the theory of concatenation is a structure
$\langle A^*, \circledast, \varepsilon, \mathcal{R}, ev \rangle$ where $\langle A^*, \circledast, \varepsilon \rangle$ is a free monoid, $\mathcal{R} = \{\mathcal{R}_1, \mathcal{R}_2, \ldots\}$ is a collection
of regular subsets of $A^*$, and $ev$ is a mapping from variables to values in $A^*$.
The satisfaction relation $\langle A^*, \circledast, \varepsilon, \mathcal{R}, ev \rangle \models \phi$ for formulas in $\mathcal{T}_{cat}$ is defined in a
standard fashion with the following assumptions.

• any variable $x$ takes the value $ev(x) \in A^*$;

• the concatenation $x \circ y$ of two variables $x, y$ takes the value $ev(x) \circledast ev(y)$;

• for each predicate symbol $R_i \in R$, the corresponding $\mathcal{R}_i \in \mathcal{R}$ defines the set of sequences
$x \in \mathcal{R}_i$ for which $R_i(x)$ holds (this also subsumes the usage of constants).

2.2.2 (Un)Decidable Fragments

The following propositions summarize some decidability results about fragments
of the theory of concatenation; they all are known results, or corollaries of them.
The standard presentation of these results focuses on solving equations over
sequences with free variables and, correspondingly, on existential fragments of
the equational theory. On the contrary, in this paper we will mostly focus on
the universal fragment, given its aptness for annotating sequence-manipulating
programs (see Section 4). It is straightforward, however, to rephrase the results
in terms of the dual existential fragments, given the availability of negation in
the language.

^1 We use the symbol $\doteq$ to distinguish it from the standard arithmetic equality symbol =
used later in the paper.

Proposition 1 (Decidability [5^ [T51). The universal and existential fragments
of the theory of concatenation over free monoids with finite rank are
decidable in PSPACE.

Proof. Decidability is a consequence of Makanin's seminal result on word equations
[32] and its extensions to the full existential (and universal) fragments
[inillS]. PSPACE complexity is a consequence of Plandowski's recent results
[551 155] and the fact that transforming first-order formulas into a single word
equation introduces only a polynomial blow-up.

The only catch is that the standard presentation assumes formulas in the
canonical form $\forall x_1 \in R_1, x_2 \in R_2, \ldots, x_v \in R_v \bullet \psi$ where regularity constraints
do not appear in $\psi$. This is, however, without loss of generality as we can put
any universal formula $\forall \bar{x} \bullet \psi$ in canonical form: first rewrite $\psi$ into

$$\bigwedge_{1 \le i \le |\bar{x}|} \; \bigwedge_{1 \le j \le |R|} \left( x_i \doteq h_i^j \vee x_i \doteq \bar{h}_i^j \right) \Rightarrow \psi$$

for fresh $h_i^j \in R_j$ and $\bar{h}_i^j \in A^* \setminus R_j$. Then, put $\psi$ in negation normal form
and eliminate occurrences of regularity predicates by applying exhaustively the
rules:

$$\frac{\psi[R_m(x_n)]}{\psi[x_n \doteq h_n^m]} \qquad \frac{\psi[\neg R_m(x_n)]}{\psi[x_n \doteq \bar{h}_n^m]}$$

It is not difficult to see that this transformation preserves satisfiability and
introduces a blow-up which is quadratic at most. □

Proposition 2 (Undecidability). • ml The $\forall^*\exists^*$ and $\exists^*\forall^*$ fragments of
the theory of concatenation are undecidable; in particular the -fragment
is undecidable already for negation-free formulas.

• Ulf The existential and universal fragments of the extension of the theory
of concatenation over the free monoid $\{a, b\}^*$ with: (1) two length
functions $|x|_a = \{y \in a^* \mid y$ has the same number of $a$'s as $x\}$ and $|x|_b =
\{y \in b^* \mid y$ has the same number of $b$'s as $x\}$; or (2) the function $Sp(x) =
|x|_a \circledast |x|_b$; are undecidable.

A set of sequences $S \subseteq A^*$ is universally (resp. existentially) definable
from concatenation iff there is a universal (resp. existential) formula $\varphi[x]$ with
$FV(\varphi[x]) = \{x\}$ such that $S = \{y \in A^* \mid \varphi[y]\}$.

Proposition 3 (Definability pTj). • The set $S^{=} = \{a^n b^n \mid n \in \mathbb{N}\}$ is neither
universally nor existentially definable from concatenation.

• The equal length predicate $Elg(x, y) \triangleq |x| = |y|$ is not definable in the
existential and universal fragments of concatenation.

Proof. Büchi and Senger prove in [11, Corollary 3] that $S^{=}$ is not existentially
definable. A very similar argument shows that $S^{<} = \{a^m b^n \mid 0 \le m < n\}$
is also not existentially definable (using the terminology of [11, Corollary 3],
the spacers of a*~^6* relative to the atom a are {a,ab'^) and there are only 
i — 1 words in with these spacers). The existential non-definability of 
entails the existential non- definability of the set = {a"&'" \ < n ^ m} 
by contradiction as follows. Assume that were existentially definable; then
X & S'^ could be defined as x € ^3u,v,p{u e a* Av G b* Ap e Aupv ^ S^) 
(that is, |m| + |p| — |w|), a contradiction. Finally, 5" is universally definable from 
concatenation iff S"^ is existentially definable from concatenation. In fact, the 
complement set {a, b}\S= is S~\JS^ with 5"" = {a, h}*b{a, b}*a{a, b}* clearly 
existentially and universally definable from concatenation. This concludes the 
proof of the first part of the proposition. 

The second part is proved in [11, Theorem 1] for the existential fragment
and it is straightforward to adapt that proof to universal definability. □

It is currently unknown whether the extension of the existential or universal 
fragment of concatenation with Elg is decidable, while allowing membership 
constraints over deterministic context-free languages gives an undecidable theory.



3 A Theory of Sequences 



This section introduces a first-order theory of sequences (Section 3.1) with
arithmetic, gives a decision procedure for its universal fragment (Section 3.2), and
shows that "natural" larger fragments are undecidable (Section 3.3).



3.1 A Theory of Integer Sequences 

We present an arithmetic theory of sequences whose elements are integers. It 
would be possible to make the theory parametric with respect to the element 
type. Focusing on integers, however, makes the presentation clearer and more 
concrete, with minimal loss of generality as one can introduce any theory defin- 
able in the integer arithmetic fragment. 



3.1.1 Syntax and Semantics 

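Syntax. Properties of integers are expressed in Presburger arithmetic, whose
signature is:

$$\Sigma_{\mathbb{Z}} = \{0, 1, +, -, =, <\}$$

Then, our theory $\mathcal{T}_{seq(\mathbb{Z})}$ of sequences with integer values has signature
$\Sigma_{seq(\mathbb{Z})} = \Sigma_{cat} \cup \Sigma_{\mathbb{Z}}$, the union of the signature of the theory of
concatenation (Section 2.2.1) and that of Presburger arithmetic.

Operator precedence is: $\circ$; $+$ and $-$; $\doteq$, $=$ and $<$, followed by logic connectives
and quantifiers with the previously defined precedence.

We will generally consider formulas in prenex normal form

$$Q \bullet \psi$$

where $Q$ is a quantifier prefix and $\psi$ is quantifier-free, written in the grammar:

$$\begin{array}{rcl}
seq & ::= & var \mid int \mid seq \circ seq \\
int & ::= & 0 \mid 1 \mid seq \mid int + int \mid int - int \\
fmla & ::= & seq \doteq seq \mid R(seq) \mid int = int \mid int < int \\
 & \mid & \neg fmla \mid fmla \vee fmla \mid fmla \wedge fmla \mid fmla \Rightarrow fmla
\end{array}$$

with $var$ ranging over variable names.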
Semantics. An interpretation of a sentence of $\mathcal{T}_{seq(\mathbb{Z})}$ is a structure
$\langle \mathbb{Z}^*, \circledast, \varepsilon, \mathcal{R}, ev \rangle$ with the following assumptions.^2

• $\langle \mathbb{Z}^*, \circledast, \varepsilon, \mathcal{R}, ev \rangle$ have the same meaning as in the theory of concatenation.

• As far as arithmetic is concerned:

  - The interpretation of a sequence $v_1 v_2 \cdots \in \mathbb{Z}^*$ of integers is the first
    integer in the sequence, $v_1$, with the convention that the interpretation
    of the empty sequence is 0.

  - Conversely, the interpretation of an integer value $v \in \mathbb{Z}$ is the singleton
    sequence $v$.

  - Addition, subtraction, equality, and less than are interpreted accordingly.

The satisfaction relation is then defined in a standard fashion.

Shorthands. We introduce several shorthands to simplify the writing of com- 
plex formulas. 

• A symbol for every constant $k \in \mathbb{Z}$, defined as obvious.

• $\alpha \ne \beta$, $\alpha \le \beta$, $\alpha \ge \beta$, and $\alpha > \beta$ defined respectively as $\neg(\alpha = \beta)$,
$\alpha < \beta \vee \alpha = \beta$, $\neg(\alpha < \beta)$, and $\alpha \ge \beta \wedge \alpha \ne \beta$.

• Shorthands such as $\alpha \le \beta < \gamma$ or $\beta \in [\alpha, \gamma)$ for $\alpha \le \beta \wedge \beta < \gamma$.

• Bounded length predicates such as $|x| < k$ for a variable $x$ and a constant
$k \in \mathbb{N}$, abbreviating $R^{<k}(x)$ with $R^{<k}$ a regular constraint interpreted as
$\{\varepsilon\} \cup \bigcup_{0 < i < k} \mathbb{Z}^i$. The definition of derived expressions such as $k_1 \le |x| < k_2$
is also as obvious.

• Subsequence functions such as $x(k_1, k_2)$ for a variable $x$ and two constants
$k_1, k_2 \in \mathbb{Z}$ with the intended semantics (see Section 2.1). We define these
functions in the theory $\mathcal{T}_{seq(\mathbb{Z})}$ by the following rewriting rules, defined on
formulas in prenex normal form with quantifier prefix $Q$:



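$$\frac{Q \bullet \psi[x(k_1, k_2)]}{Q\, \forall u, v, w \bullet \left(\begin{array}{ll}
 & \kappa_1 \wedge x \doteq uvw \wedge |u| = k_1 - 1 \wedge |v| = k_2 - k_1 + 1 \\
\vee & \kappa_2 \wedge x \doteq uvw \wedge |u| = k_1 - 1 \wedge |w| = -k_2 \\
\vee & \kappa_3 \wedge x \doteq uvw \wedge |v| = -k_1 + k_2 + 1 \wedge |w| = -k_2 \\
\vee & \neg(\kappa_1 \vee \kappa_2 \vee \kappa_3) \wedge u \doteq v \doteq w \doteq \varepsilon
\end{array}\right) \Rightarrow \psi[v]}$$

where:

$$\begin{array}{rcl}
\kappa_1 & \triangleq & 1 \le k_1 \le k_2 \le |x| \\
\kappa_2 & \triangleq & k_1 - |x| \le k_2 < 1 \le k_1 \\
\kappa_3 & \triangleq & 1 - |x| \le k_1 \le k_2 < 1
\end{array}$$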



• fst($x$) and lst($x$) for the first $x(1, 1)$ and last element $x(0, 0)$ of $x$,
respectively.



^2 The presentation of the semantics of the theory is informal and implicit for brevity.
3.1.2 Examples

A few examples demonstrate the expressiveness of the universal fragment of
$\mathcal{T}_{seq(\mathbb{Z})}$ to specify properties of sequences.

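1. Equality: sequences $u$ and $v$ are equal.

$$u \doteq v \qquad (1)$$

2. Bounded equality: sequences $u$ and $v$ are equal in the constant interval
$[l, u]$ for $l, u \in \mathbb{Z}$.

$$u(l, u) \doteq v(l, u) \qquad (2)$$

3. Boundedness: no element in sequence $u$ is greater than value $v$.

$$\forall h, t \bullet u \doteq ht \Rightarrow t \le v \qquad (3)$$

4. Sortedness: sequence $u$ is sorted (strictly increasing).

$$\forall h, m, t \bullet u \doteq hmt \wedge |m| = 1 \wedge |t| > 0 \Rightarrow m < t \qquad (4)$$

5. Injectivity: $u$ has no repeated elements.

$$\forall h, v_1, m, v_2, t \bullet u \doteq h v_1 m v_2 t \wedge |v_1| = 1 \wedge |v_2| = 1 \Rightarrow v_1 \ne v_2 \qquad (5)$$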

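6. Partitioning: sequence $u$ is partitioned at constant position $k > 0$.

$$\forall h_1, t_1, h_2, t_2 \bullet \left(\begin{array}{l} u(1, k) \doteq h_1 t_1 \\ \wedge\; u(k + 1, 0) \doteq h_2 t_2 \\ \wedge\; |t_1| > 0 \wedge |t_2| > 0 \end{array}\right) \Rightarrow t_1 < t_2 \qquad (6)$$

7. Membership: constant element $k \in \mathbb{Z}$ occurs in sequence $u$.

$$u \in (\mathbb{Z}^* k \mathbb{Z}^*) \qquad (7)$$

8. Non-membership: no element in sequence $u$ has value $v$.

$$\forall h, t \bullet u \doteq ht \wedge |t| > 0 \Rightarrow t \ne v \qquad (8)$$

9. Periodicity: in non-empty sequence $u$, elements on even positions have
value 0 and elements on odd positions have value 1 (notice that lst($h$) = 0
if $h$ is empty).

$$\forall h, t \bullet u \doteq ht \wedge |t| > 0 \Rightarrow \left(\begin{array}{l} \text{lst}(h) = 1 \\ \Rightarrow t = 0 \end{array}\right) \wedge \left(\begin{array}{l} \text{lst}(h) = 0 \\ \Rightarrow t = 1 \end{array}\right) \qquad (9)$$

10. Comparison between indices and values: for every index $i$, element at
position $i$ in the non-empty sequence $u$ has value $i + 3$.

$$u = 1 + 3 \wedge \forall h, t, v \bullet u \doteq ht \wedge |h| > 0 \wedge |t| > 0 \wedge \text{lst}(h) = v \Rightarrow t = v + 1 \qquad (10)$$

11. Disjunction of value constraints: for every pair of positions $i < j$ in
the sequence $u$, either $u(i, i) < u(j, j)$ or $u(i, i) > 2u(j, j)$.

$$\forall h, v_1, m, v_2, t \bullet u \doteq h v_1 m v_2 t \wedge |v_1| > 0 \wedge |v_2| > 0 \Rightarrow v_1 < v_2 \vee v_1 > v_2 + v_2 \qquad (11)$$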
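
To make the reading of such universal formulas concrete, the following Python sketch evaluates sortedness (4) and injectivity (5) on a given finite sequence by brute force: it enumerates every decomposition of $u$ into the quantified pieces and checks the implication on each. This is only an illustration of the semantics under the convention that a sequence used as an integer denotes its first element (0 if empty); it is not the decision procedure of Section 3.2, and the helper names `val` and `splits` are hypothetical.

```python
from itertools import combinations_with_replacement


def val(seq):
    """Arithmetic value of a sequence: its first element, or 0 if empty."""
    return seq[0] if seq else 0


def splits(u, parts):
    """Yield every way of writing u as a concatenation of `parts` pieces."""
    n = len(u)
    # Non-decreasing cut points describe all decompositions, empty pieces included.
    for cuts in combinations_with_replacement(range(n + 1), parts - 1):
        bounds = (0,) + cuts + (n,)
        yield tuple(u[bounds[i]:bounds[i + 1]] for i in range(parts))


def is_sorted(u):
    """Formula (4): for all h, m, t with u = hmt, |m| = 1, |t| > 0: m < t."""
    return all(val(m) < val(t)
               for h, m, t in splits(u, 3)
               if len(m) == 1 and len(t) > 0)


def is_injective(u):
    """Formula (5): two singleton pieces at distinct positions differ."""
    return all(val(v1) != val(v2)
               for h, v1, m, v2, t in splits(u, 5)
               if len(v1) == 1 and len(v2) == 1)
```

The enumeration is polynomial in the length of the sequence for a fixed number of quantified variables, which is enough for experimenting with small examples.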
Comparison with theories of arrays. Properties such as strict sortedness
(4), periodicity (9), and comparisons between indices and values (10) are
inexpressible in the array logic of Bradley et al. [8]. The latter is inexpressible also
in the logic of Ghilardi et al. [16] because Presburger arithmetic is restricted to
indices. Properties such as (11) are inexpressible both in the SIL array logic
of [21], because quantification on multiple array indices is disallowed, and
in the related LIA logic of [22], because disjunctions of comparisons of array
elements are disallowed. Extensions of each of these logics to accommodate the
required features would be undecidable.

Conversely, properties such as permutation, bounded equality for an interval
specified by indices, length constraints for a variable value, membership for a
variable value, and the subsequence relation, are inexpressible in the universal
fragment of $\mathcal{T}_{seq(\mathbb{Z})}$. Notice that membership and the subsequence relation are
expressible in the dual existential fragment of $\mathcal{T}_{seq(\mathbb{Z})}$, while the other properties
seem to entail undecidability of the corresponding $\mathcal{T}_{seq(\mathbb{Z})}$ fragment (see Section
3.3). Bounded equality, length constraints, and membership, on the other hand,
are expressible in all the logics of [8, 16, 21, 22], and [TO] outlines a decidable
extension which supports the subsequence relation (see Section 5).

3.2 Deciding Properties of Integer Sequences 

This section presents a decision procedure $\mathcal{D}_{seq(\mathbb{Z})}$ for the universal fragment of
$\mathcal{T}_{seq(\mathbb{Z})}$. The procedure transforms any universal $\mathcal{T}_{seq(\mathbb{Z})}$ formula into an
equi-satisfiable universal formula in the theory of concatenation over the free monoid
$\{a, b, c, d\}^*$. The basic idea is to encode integers as sequences over the four
symbols $\{a, b, c, d\}$: the sequence $a c b^{k_1} a$ encodes a nonnegative integer $k_1$, while the
sequence $a d b^{-k_2} a$ encodes a negative integer $k_2$. Suitable rewrite rules encode all
quantifier-free Presburger arithmetic in accordance with this convention. The
next subsection 3.2.1 outlines the decision procedure $\mathcal{D}_{seq(\mathbb{Z})}$, while subsection
3.2.2 illustrates its correctness and discusses its complexity.
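
As a concrete illustration of the encoding, the following Python sketch maps integers to words over $\{a, b, c, d\}$ and back. It is a minimal sketch, assuming words are represented as Python strings; the function names are hypothetical and chosen for this example only.

```python
import re


def encode_int(k):
    """Encode k as a c b^k a if k >= 0, and as a d b^(-k) a if k < 0."""
    sign = "c" if k >= 0 else "d"
    return "a" + sign + "b" * abs(k) + "a"


def encode_seq(ks):
    """Encode a sequence of integers as the concatenation of its encodings."""
    return "".join(encode_int(k) for k in ks)


# One encoded integer: a c b* a (nonnegative) or a d b+ a (negative).
TOKEN = re.compile(r"a(?:cb*|db+)a")


def decode_seq(word):
    """Decode a word in (acb*a | adb+a)* back into a list of integers."""
    out, pos = [], 0
    while pos < len(word):
        match = TOKEN.match(word, pos)
        if match is None:
            raise ValueError(f"not a valid encoding at position {pos}")
        token = match.group()
        magnitude = len(token) - 3  # drop the two delimiters and the sign
        out.append(magnitude if token[1] == "c" else -magnitude)
        pos = match.end()
    return out
```

Since every encoded integer starts and ends with the delimiter $a$ and carries its sign right after the first delimiter, the encoding of a sequence can be split back into its elements unambiguously.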

3.2.1 $\mathcal{D}_{seq(\mathbb{Z})}$: A Decision Procedure for $\mathcal{T}_{seq(\mathbb{Z})}$

Consider a universal formula of $\mathcal{T}_{seq(\mathbb{Z})}$ in prenex normal form:

$$\forall x_1, \ldots, x_v \bullet \psi \qquad (12)$$

where $\psi$ is quantifier-free. Modify (12) by application of the following steps.

1. Introduce fresh variables to normalize formulas into the following form:

$$\begin{array}{rcl}
fmla & ::= & var \doteq var \mid var \doteq var \circ var \mid R(var) \mid var = 0 \mid var = 1 \\
 & \mid & var = var \mid var = var + var \mid var = var - var \mid var < var \\
 & \mid & \neg fmla \mid fmla \vee fmla \mid fmla \wedge fmla \mid fmla \Rightarrow fmla
\end{array}$$

Clearly, we can achieve this by applying exhaustively rewrite rules that
operate on $\psi$ such as:

$$\frac{\psi[x \circ y]}{e \doteq x \circ y \Rightarrow \psi[e]} \qquad \frac{\psi[x + y]}{f = x + y \Rightarrow \psi[f]}$$

for fresh variables $e, f$.
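2. For each variable $x_i \in FV(\psi) = \{x_1, \ldots, x_v\}$, introduce the fresh variables
$h_i, t_i, s_i, m_i$ (for head, tail, sign, modulus) and rewrite $\psi$ as:

$$\bigwedge_{1 \le i \le v} \left(\begin{array}{l}
\left(\begin{array}{l} x_i \doteq h_i t_i \\ \wedge\; h_i \doteq a s_i m_i a \\ \wedge\; s_i \in \{c, d\} \\ \wedge\; m_i \in b^* \\ \wedge\; t_i \in (acb^*a \cup adb^+a)^* \end{array}\right) \\
\vee \\
\left(\begin{array}{l} x_i \doteq \varepsilon \\ \wedge\; h_i \doteq a s_i m_i a \\ \wedge\; s_i \doteq c \\ \wedge\; m_i \doteq \varepsilon \\ \wedge\; t_i \doteq \varepsilon \end{array}\right)
\end{array}\right) \Rightarrow \psi$$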



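3. Apply the following rule exhaustively to remove arithmetic equalities:

$$\frac{\psi[x_i = x_j]}{\psi[h_i \doteq h_j]} \qquad \frac{\psi[x_i = 0]}{\psi[h_i \doteq aca]} \qquad \frac{\psi[x_i = 1]}{\psi[h_i \doteq acba]}$$

4. Apply the following rule exhaustively to remove differences:

$$\frac{\psi[x_k = x_i - x_j]}{\psi[x_i = x_k + x_j]}$$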

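5. Apply the following rule exhaustively to remove comparisons:

$$\frac{\psi[x_i < x_j]}{\left(\begin{array}{l} m_i \doteq m_j \\ \vee\; m_i \doteq m_j p \\ \vee\; m_j \doteq m_i p \end{array}\right) \Rightarrow \psi\left[\begin{array}{l} s_i \doteq d \wedge s_j \doteq c \\ \vee\; s_i \doteq s_j \doteq c \wedge m_j \doteq m_i p \\ \vee\; s_i \doteq s_j \doteq d \wedge m_i \doteq m_j p \end{array}\right]}$$

for fresh $p \in b^+$.

6. Apply the following rule exhaustively to remove sums:

$$\frac{\psi[x_k = x_i + x_j]}{\left(\begin{array}{l} m_i \doteq m_j \\ \vee\; m_i \doteq m_j p \\ \vee\; m_j \doteq m_i p \end{array}\right) \Rightarrow \psi\left[\begin{array}{l} s_i \doteq s_j \wedge h_k \doteq a s_i m_i m_j a \\ \vee\; s_i \not\doteq s_j \wedge m_i \doteq m_j \wedge h_k \doteq aca \\ \vee\; s_i \not\doteq s_j \wedge m_i \doteq m_j p \wedge h_k \doteq a s_i p a \\ \vee\; s_i \not\doteq s_j \wedge m_j \doteq m_i p \wedge h_k \doteq a s_j p a \end{array}\right]}$$

for fresh $p \in b^+$.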



7. Modify the meaning of regularity constraints as follows: let $R_i$ be defined
by a regular expression with constants in $\mathbb{Z}$. Substitute every occurrence
of a nonnegative constant $k \in \mathbb{Z}$ by $a c b^k a$; every occurrence of a negative
constant $k \in \mathbb{Z}$ by $a d b^{-k} a$; every occurrence of set $\mathbb{Z}$ by $acb^*a \cup adb^+a$.

The resulting formula is again in form (12) where $\psi$ is now a quantifier-free
formula in the theory of concatenation over $\{a, b, c, d\}^*$; its validity is decidable
by Proposition 1.
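
The translation of regularity constraints in step 7 can be illustrated on the membership constraint (7), $u \in \mathbb{Z}^* k \mathbb{Z}^*$. The following Python sketch is an assumption-laden illustration rather than part of the procedure: it represents words as strings, uses Python regular expressions to stand for regular constraints over $\{a, b, c, d\}$, and introduces the hypothetical helpers `enc`, `member_pattern`, and `contains`.

```python
import re

# Step 7: the set Z of all integers becomes  a c b* a  U  a d b+ a.
INT = r"(?:acb*a|adb+a)"


def enc(k):
    """Unary encoding of a single integer constant."""
    return "a" + ("c" if k >= 0 else "d") + "b" * abs(k) + "a"


def member_pattern(k):
    """Regular constraint Z* k Z*, translated to the alphabet {a, b, c, d}."""
    return re.compile(f"{INT}*{re.escape(enc(k))}{INT}*")


def contains(u, k):
    """Check the translated constraint on the encoding of the sequence u."""
    word = "".join(enc(x) for x in u)
    return member_pattern(k).fullmatch(word) is not None
```

The check succeeds exactly when `k` occurs in `u`, because each encoded integer is delimited by $a$ on both sides and so an encoded constant cannot match across the boundary between two elements.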
3.2.2 Correctness and Complexity

Let us sketch the correctness argument for the decision procedure $\mathcal{D}_{seq(\mathbb{Z})}$, which
shows that the transformed formula is equi-satisfiable with the original one.

The justification for step 1 is straightforward. After applying it, a series of
substitutions eliminates arithmetic by reducing it to equations over the theory
of concatenation with the unary encoding of integers defined above.

Step 2 requires that any variable $x_i$ is a sequence of the form $(acb^*a \cup adb^+a)^*$
and introduces fresh variables to denote significant parts of the sequence: $h_i$
aliases the first element of the sequence, which is further split into its sign $s_i$
($c$ for nonnegative and $d$ for negative) and its absolute value $m_i$ encoded as a
unary string in $b^*$. The second term of the disjunction deals with the case of $x_i$
being $\varepsilon$, which has the same encoding as 0.

The following steps replace elements of the signature of Presburger arithmetic
by rewriting them as equations over sequences with the given encoding.
Step 3 reduces the arithmetic equality of two sequences of integers to equivalence
of the sequences encoding their first elements. Step 4 rewrites equations
involving differences with equations involving sums.

Step 5 reduces arithmetic comparisons of two sequences of integers to a case
discussion over the sequences $h_i, h_j$ encoding their first elements. Let $p$ be a
sequence in $b^+$ encoding the difference between the absolute values corresponding
to $h_i$ and $h_j$; obviously such a $p$ always exists unless the absolute values are
equal. Then, $h_i$ encodes an integer strictly less than $h_j$ iff one of the following
holds: (1) $h_i$ is a negative value and $h_j$ is a nonnegative one; (2) both $h_i$ and $h_j$
are a nonnegative value and the sequence of $b$'s in $h_j$ is longer than the sequence
of $b$'s in $h_i$; or (3) both $h_i$ and $h_j$ are a negative value and the sequence of $b$'s
in $h_i$ is longer than the sequence of $b$'s in $h_j$.

Step 6 reduces the comparison between the value of a sum of two variables
and a third variable to an analysis of the three sequences $h_i, h_j, h_k$ encoding the
first elements of the three variables. As in step 5, the unary sequence $p$ encodes
the difference between the absolute values corresponding to $h_i$ and $h_j$. Then,
$h_k$ encodes the sum of the values encoded by $h_i$ and $h_j$ iff one of the following
holds: (1) $h_i$ and $h_j$ have the same sign and $h_k$ contains a sequence of $b$'s which
adds up the sequences of $b$'s of $h_i$ and $h_j$, still with the same sign; (2) $h_i$ and
$h_j$ have opposite sign but same absolute value, so $h_k$ must encode 0; (3) $h_i$ and
$h_j$ have opposite sign and the absolute value of $h_i$ is greater than the absolute
value of $h_j$, so $h_k$ has the same sign as $h_i$ and the difference of absolute values
as its absolute value; or (4) $h_i$ and $h_j$ have opposite sign and the absolute value
of $h_j$ is greater than the absolute value of $h_i$, so $h_k$ has the same sign as $h_j$ and
the difference of absolute values as its absolute value.
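
The case analyses for steps 5 and 6 can be checked against ordinary integer arithmetic on small values. The following Python sketch is a sanity check of the cases as described in the prose above, not an implementation of the rewriting rules; encoded integers are represented as strings, and the helper names `enc`, `parts`, `difference`, `less_than`, and `add` are hypothetical.

```python
def enc(k):
    """Unary encoding: a c b^k a for k >= 0, a d b^(-k) a for k < 0."""
    return "a" + ("c" if k >= 0 else "d") + "b" * abs(k) + "a"


def parts(h):
    """Split an encoded integer a s m a into its sign s and modulus m."""
    return h[1], h[2:-1]


def difference(mi, mj):
    """Return p in b+ such that mi = mj p, or None if no such p exists."""
    return mi[len(mj):] if len(mi) > len(mj) else None


def less_than(hi, hj):
    """Step 5: the three cases under which hi encodes a smaller integer."""
    si, mi = parts(hi)
    sj, mj = parts(hj)
    return ((si == "d" and sj == "c")
            or (si == sj == "c" and difference(mj, mi) is not None)
            or (si == sj == "d" and difference(mi, mj) is not None))


def add(hi, hj):
    """Step 6: the encoding hk of the sum, built by the four cases."""
    si, mi = parts(hi)
    sj, mj = parts(hj)
    if si == sj:                      # (1) same sign: append the moduli
        return "a" + si + mi + mj + "a"
    if mi == mj:                      # (2) opposite values cancel to 0
        return "aca"
    p = difference(mi, mj)
    if p is not None:                 # (3) |hi| > |hj|: sign of hi
        return "a" + si + p + "a"
    return "a" + sj + difference(mj, mi) + "a"  # (4) |hj| > |hi|: sign of hj
```

Comparing `less_than(enc(i), enc(j))` with `i < j` and `add(enc(i), enc(j))` with `enc(i + j)` over a small range of integers exercises every case at least once.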

Finally, step 7 details how to translate the interpretation of the regular
constraints over $\mathbb{Z}$ into the corresponding regularity constraints over $\{a, b, c, d\}$
with the given integer encoding.

It is not difficult to see that all rewriting steps in the decision procedure
$\mathcal{D}_{seq(\mathbb{Z})}$ increase the size of $\psi$ at most quadratically (this accounts for fresh
variables as well). Hence, the PSPACE complexity of the universal fragment of
the theory of concatenation (Proposition 1) carries over to $\mathcal{D}_{seq(\mathbb{Z})}$.

Theorem 4. The universal fragment of $\mathcal{T}_{seq(\mathbb{Z})}$ is decidable in PSPACE with
the decision procedure $\mathcal{D}_{seq(\mathbb{Z})}$.
3.3 Undecidable Extensions

Theorem 5. The following extensions of the universal fragment of $\mathcal{T}_{seq(\mathbb{Z})}$ are
undecidable.

1. The $\forall^*\exists^*$ and $\exists^*\forall^*$ fragments.

2. For any pair of integer constants $k_1, k_2$, the extension with the two length
functions $|x|_{k_1}$, $|x|_{k_2}$ counting the number of occurrences of $k_1$ and $k_2$ in $x$.

3. The extension with an equal length predicate $Elg(x, y) \triangleq |x| = |y|$.

4. The extension with a sum function $\sigma(x) \triangleq \sum_{i=1}^{|x|} x(i, i)$.

Proof. 1. Sentences with one quantifier alternation are undecidable already
for the theory of concatenation without arithmetic and over a monoid
of finite rank (Proposition 2). Notice that the set of sentences that are
expressible both in the $\forall^*\exists^*$ and in the $\exists^*\forall^*$ fragment is decidable [101
Th. 4.4]; however, this set lacks a simple syntactic characterization.

2. Corollary of Proposition 2.

3. We encode the universal theory of $\Pi = \langle \mathbb{N}, 0, 1, +, \pi \rangle$, where $\pi(x, y) \triangleq
x 2^y$, in the universal fragment of $\mathcal{T}_{seq(\mathbb{Z})}$ extended by the $Elg$ predicate;
undecidability follows from the undecidability of the existential and
universal theories of $\Pi$ [11, Corollary 5]. All we have to do is show that
$\pi(x, y) = p$ is universally definable in $\mathcal{T}_{seq(\mathbb{Z})}$ with $Elg$. To this end, first
define $l_y$ as a sequence that begins with value $y$, ends with value 1, and
where every element is the successor of the element that follows.

$$\forall h, t \bullet \text{fst}(l_y) = y \wedge \text{lst}(l_y) = 1 \wedge l_y \doteq ht \wedge |h| > 0 \wedge |t| > 0 \Rightarrow \text{lst}(h) = t + 1$$

As a result $l_y$ is in the form $y, y - 1, \ldots, 1$ and hence has length $y$.^3 Then,
$\pi(x, y)$ is universally definable as the sequence $p$ with the same length
as $l_y$, whose last element is $x$, and where every element is obtained by
doubling the value of the element that follows:

$$\forall g, u \bullet Elg(p, l_y) \wedge \text{lst}(p) = x \wedge p \doteq gu \wedge |g| > 0 \wedge |u| > 0 \Rightarrow \text{lst}(g) = u + u$$

Hence $p$ has the form $2^y x, 2^{y-1} x, \ldots, 2^2 x, 2x, x$ which encodes the desired
value $x 2^y$ in $\mathcal{T}_{seq(\mathbb{Z})}$. (Notice that the two universal definitions of $l_y$ and
$p$ can be combined into a single universal definition by conjoining the
definition of $p$ to the consequent in the definition of $l_y$.)

4. For any sequence $x$ over $\{0, 1\}$ define $Sp(x) = y$ as $y \in 0^*1^* \wedge \sigma(y) =
\sigma(x)$. Then, Proposition 2 implies undecidability because this extension
of $\mathcal{T}_{seq(\mathbb{Z})}$ can define universal sentences over the free monoid $\{a, b\}^*$ with
the function $Sp$. □

^3 This technique would allow the definition of the length function $|x|$ and full index
arithmetic as well.
 1  merge_sort (a: ARRAY): ARRAY
 2    local l, r: ARRAY
 3    do
 4      if |a| ≤ 1 then
 5        { sorted(a) }
 6        Result := a
 7      else
 8        l, r := a[1 : |a|/2], a[|a|/2 + 1 : |a|]
 9        { l ★ r = a }
10        l, r := merge_sort(l), merge_sort(r)
11        { sorted(l) ∧ sorted(r) }
12        from Result := ε
13        { invariant sorted(Result) ∧ sorted(l) ∧ sorted(r) ∧
14            lst(Result) < fst(l) ∧ lst(Result) < fst(r) }
15        until |l| = 0 ∨ |r| = 0
16        loop
17          if l.first > r.first then
18            Result := Result ★ r.first ; r := r.rest
19          else
20            Result := Result ★ l.first ; l := l.rest
21          end
22        end
23        if |l| > 0 then
24          { |r| = 0 } Result := Result ★ l
25        else
26          { |l| = 0 } Result := Result ★ r
27        end
28    { ensure sorted(Result) }


 1  reverse (a: LIST): LIST
 2    local v: INTEGER ; s: STACK
 3    do
 4      from s := ε
 5      { invariant s ∘ a = old a }
 6      until a = ε
 7      loop
 8        s.push (a.first)
 9        a := a.rest
10      end
11      from Result := ε
12      { invariant
13          s ∘ Result^R = old a }
14      until s = ε
15      loop
16        v := s.top
17        s.pop ; Result.extend (v)
18      end
19    { ensure Result^R = old a }



Table 1: Annotated Mergesort (left) and Array Reversal (right). 



The decidability of the following is instead currently unknown: the extension
of the universal fragment with a function $x \oplus 1$ defined as the sequence
$x(1) + 1, x(2) + 1, \ldots, x(|x|) + 1$. The fragment allows the definition of the set
$S^{=} = \{0^n 1^n \mid n \in \mathbb{N}\}$ as the sequences $x$ such that
$x \in 0^*1^* \wedge \forall u, v \bullet x \doteq uv \wedge u \in 0^* \wedge v \in 1^* \Rightarrow u \oplus 1 \doteq v$.
This is inexpressible in the universal fragment of the theory
of concatenation, but the decidability of the resulting fragment is currently
unknown (see Proposition 3).



4 Verifying Sequence-Manipulating Programs 

This section outlines a couple of examples that demonstrate using formulas in
the theory $\mathcal{T}_{seq(\mathbb{Z})}$ to reason about sequence-manipulating programs. An
implementation of the decision procedure $\mathcal{D}_{seq(\mathbb{Z})}$ is needed to tackle more extensive
examples; it is currently underway. The examples are in Eiffel-like pseudo-code
[36]: it is not difficult to detail an axiomatic semantics and a backward
substitution calculus, using the universal fragment of $\mathcal{T}_{seq(\mathbb{Z})}$, for the portions of this
language used in the examples.

Reversal. In Table 1 (right), a program reverses a sequence of integers, given
as a list a, using a stack s. The query "first" returns the first element in a 
list, and the command "extend" adds an element to the right of a list; the 
query "top" and the commands "pop" and "push" for a stack have the usual 
semantics. In the annotations, s is modeled by a sequence whose first element 
is the bottom of the stack, whereas the expression old a denotes the value of a 
upon entering the routine. 
The superscript $R$ denotes the reversal of a sequence. We do not know if the
extension of $\mathcal{T}_{\mathrm{seq}(\mathbb{Z})}$ by a reversal function is
decidable. However, the following two simple update axioms are sufficient to
handle any program that builds the reverse $u^R$ of a sequence $u$ starting
from an empty sequence and adding one element at a time:

\[
u^R = \epsilon \Leftrightarrow u = \epsilon
\qquad\qquad
|x| = 1 \Rightarrow (ux)^R = x\,u^R
\]

Consider, for instance, the verification condition that checks if the invariant 
of the second loop (lines 11-18) is indeed inductive: 

\[
s \circ \mathit{Result}^R = \mathbf{old}\ a \;\land\; s \neq \epsilon
\;\Rightarrow\;
s(1,-1) \circ (\mathit{Result} \circ s(0,0))^R = \mathbf{old}\ a
\]

After rewriting $(\mathit{Result} \circ s(0,0))^R$ into $s(0,0) \circ \mathit{Result}^R$, the implication is
straightforward to check for validity. The rest of the program is also simple 
to check with standard backward reasoning techniques. 
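
As an informal companion to this argument, the routine and its annotations can
be executed on concrete inputs. The sketch below is only an illustration under
stated assumptions: sequences and the stack are Python lists (bottom of the
stack first, as in the annotations), the names `reverse_with_stack`,
`inductive_step_holds`, and `check_small_cases` are hypothetical, and running
assertions on small inputs is testing, not a proof of validity.

```python
from itertools import product

def reverse_with_stack(a):
    """Reversal routine of Table 1 (right) with annotations as assertions."""
    old_a = list(a)
    a = list(a)
    s = []
    while a:                                  # until a = empty
        assert s + a == old_a                 # invariant: s o a = old a
        s.append(a[0])                        # s.push (a.first)
        a = a[1:]                             # a := a.rest
    assert s + a == old_a
    result = []
    while s:                                  # until s = empty
        assert s + result[::-1] == old_a      # invariant: s o Result^R = old a
        v = s[-1]                             # v := s.top
        s.pop()
        result.append(v)                      # Result.extend (v)
    assert result[::-1] == old_a              # ensure: Result^R = old a
    return result

def inductive_step_holds(s, result):
    """Check the inductiveness condition of the second loop for one instance.

    old a is chosen as s o Result^R, so the invariant holds before the step.
    """
    if not s:
        return True                           # hypothesis s != empty fails
    old_a = s + result[::-1]
    # s(1,-1) is s without its last element; s(0,0) is the last element of s.
    return s[:-1] + (result + s[-1:])[::-1] == old_a

def check_small_cases(max_len=3, alphabet=(0, 1)):
    """Exhaustively test the condition on all short sequences."""
    seqs = [list(p) for n in range(max_len + 1)
            for p in product(alphabet, repeat=n)]
    return all(inductive_step_holds(s, r) for s in seqs for r in seqs)
```

The slice `s[-1:]` plays the role of $s(0,0)$ and `s[:-1]` that of $s(1,-1)$;
the second update axiom is what justifies moving the last element of `s` to
the front of the reversed result.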

Mergesort. Consider a straightforward recursive implementation of the
Mergesort algorithm; Table 1 (left) shows an annotated version, where ★ denotes
the concatenation operator in the programming language (whose semantics is
captured by the corresponding logic operator ∘). The annotations specify that
the routine produces a sorted array, where predicate sorted(u) is defined as:

\[
\mathit{sorted}(u) \;\triangleq\; \forall h, m, t \bullet u = hmt \land |m| \geq 1 \land |t| \geq 1 \Rightarrow m \leq t
\]
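
To make the intended reading concrete, the sketch below evaluates the
definition on explicit sequences. It rests on two assumptions that are not
spelled out in this section: that m and t range over non-empty subsequences,
and that a comparison between two sequences compares their first elements.
The function names are hypothetical.

```python
from itertools import product

def sorted_by_definition(u):
    """For every split u = h m t with non-empty m and t, fst(m) <= fst(t)."""
    n = len(u)
    for i in range(n + 1):              # h = u[:i]
        for j in range(i, n + 1):       # m = u[i:j], t = u[j:]
            m, t = u[i:j], u[j:]
            if len(m) >= 1 and len(t) >= 1 and not (m[0] <= t[0]):
                return False
    return True

def adjacent_sorted(u):
    """Usual non-strict sortedness: each element is <= its successor."""
    return all(u[k] <= u[k + 1] for k in range(len(u) - 1))

def agree_on_small_sequences(max_len=5, alphabet=(0, 1, 2)):
    """Compare the two notions on all sequences up to max_len."""
    return all(sorted_by_definition(u) == adjacent_sorted(u)
               for n in range(max_len + 1)
               for u in product(alphabet, repeat=n))
```

Under these assumptions the quantified definition coincides with ordinary
non-strict sortedness on all sequences tried.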

It is impossible to express in $\mathcal{T}_{\mathrm{seq}(\mathbb{Z})}$ another
component of the full functional specification: the output is a permutation of
the input. This condition is inexpressible in most of the expressive decidable
extensions of the theory of arrays that are currently known, such as [3, 21]
(see also Section 5). Complementary automated verification techniques, using
different abstractions such as the multiset [37], can, however, verify this
orthogonal aspect.

We must also abstract away the precise splitting of array a into two halves
in line 8. The way in which a is partitioned into l and r is, however,
irrelevant as far as correctness is concerned (it only influences the
complexity of the algorithm); hence we can simply over-approximate the
instruction on line 8 by a nondeterministic splitting into two contiguous
non-empty parts.
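
The effect of this over-approximation can be observed on an executable
version of the routine. The following sketch is an assumption-laden
illustration, not the annotated program itself: arrays are Python lists, the
split point is drawn from a caller-supplied random generator `rng` (a
hypothetical parameter standing in for nondeterminism), and the annotations
are checked as runtime assertions.

```python
import random

def is_sorted(u):
    """Non-strict sortedness of a list."""
    return all(u[k] <= u[k + 1] for k in range(len(u) - 1))

def merge_sort(a, rng):
    """Mergesort whose split point is arbitrary rather than the midpoint."""
    a = list(a)
    if len(a) <= 1:
        return a
    cut = rng.randrange(1, len(a))      # any split into two non-empty parts
    l, r = a[:cut], a[cut:]
    assert l and r                      # { |l| > 0 and |r| > 0 }
    l, r = merge_sort(l, rng), merge_sort(r, rng)
    assert is_sorted(l) and is_sorted(r)
    result = []
    while l and r:                      # until |l| = 0 or |r| = 0
        # Loop invariant; the bounds on lst(Result) are vacuous while
        # Result is still empty.
        assert is_sorted(result) and is_sorted(l) and is_sorted(r)
        assert not result or (result[-1] <= l[0] and result[-1] <= r[0])
        if l[0] > r[0]:
            result, r = result + [r[0]], r[1:]
        else:
            result, l = result + [l[0]], l[1:]
    result = result + l if l else result + r
    assert is_sorted(result)            # { ensure sorted(Result) }
    return result
```

Whatever split points the generator picks, the sortedness assertions hold;
only the recursion depth, and hence the running time, changes.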

From the annotated program, we can generate verification conditions by
standard backward reasoning. Universal sentences of
$\mathcal{T}_{\mathrm{seq}(\mathbb{Z})}$ can express the verification
conditions, hence the verification process can be automated. Let us illustrate
the non-trivial part of the process, namely checking that the formula on lines
13-14 is indeed an inductive invariant. Consider the "then" branch on line 18.
Backward substitution of the invariant yields:

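\[
\begin{aligned}
& \mathit{sorted}(\mathit{Result} \star \mathit{fst}(r)) \land \mathit{sorted}(l) \land \mathit{sorted}(r(2,0)) \;\land \\
& \mathit{lst}(\mathit{Result} \star \mathit{fst}(r)) \leq \mathit{fst}(l) \land \mathit{lst}(\mathit{Result} \star \mathit{fst}(r)) \leq \mathit{fst}(r(2,0))
\end{aligned}
\tag{13}
\]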

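This condition must be discharged by the corresponding loop invariant
hypothesis:

\[
\begin{aligned}
& \mathit{fst}(l) > \mathit{fst}(r) \land \mathit{sorted}(\mathit{Result}) \land \mathit{sorted}(l) \land \mathit{sorted}(r) \;\land \\
& \mathit{lst}(\mathit{Result}) \leq \mathit{fst}(l) \land \mathit{lst}(\mathit{Result}) \leq \mathit{fst}(r) \land |l| \neq 0 \land |r| \neq 0
\end{aligned}
\tag{14}
\]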

Checking that (14) entails (13) discharges the corresponding verification
condition. Elements of this condition can be encoded in the universal fragment
of $\mathcal{T}_{\mathrm{seq}(\mathbb{Z})}$ and proven using the decision
procedure of Section 3.2; for instance, the fact that
$\mathit{lst}(\mathit{Result}) \leq \mathit{fst}(l)$, $|l| \neq 0$, $|r| \neq 0$,
and $\mathit{fst}(l) > \mathit{fst}(r)$ imply
$\mathit{lst}(\mathit{Result} \star \mathit{fst}(r)) \leq \mathit{fst}(l)$
corresponds to the validity of (all free variables are implicitly universally
quantified):



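\[
\left(
\begin{array}{l}
r = h_r m_r t_r \land |h_r| = 1 \land |r| \neq 0 \\
{}\land l = h_l m_l t_l \land |h_l| = 1 \land |l| \neq 0 \\
{}\land \mathit{Result} \circ h_r = hmt \land |t| = 1 \\
{}\land h_l > h_r
\end{array}
\right)
\;\Rightarrow\; t \leq h_l
\]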
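
The shape of this implication can be explored by enumerating small instances.
The sketch below is a brute-force illustration, not the decision procedure of
Section 3.2: it assumes that sequences are Python tuples, that comparisons
between length-one sequences compare their single elements, and it uses the
hypothetical helper names `splits3` and `vc_instance_holds`.

```python
from itertools import product

def splits3(u):
    """All decompositions u = h m t into three contiguous, possibly empty parts."""
    n = len(u)
    return [(u[:i], u[i:j], u[j:])
            for i in range(n + 1) for j in range(i, n + 1)]

def vc_instance_holds(result, l, r):
    """Check the implication for fixed Result, l, r and all decompositions."""
    for hr, _mr, _tr in splits3(r):
        if len(hr) != 1:                 # premise: |h_r| = 1 (so r is non-empty)
            continue
        for hl, _ml, _tl in splits3(l):
            if len(hl) != 1:             # premise: |h_l| = 1 (so l is non-empty)
                continue
            if not hl[0] > hr[0]:        # premise: h_l > h_r
                continue
            for _h, _m, t in splits3(result + hr):
                # premise: Result o h_r = h m t with |t| = 1
                if len(t) == 1 and not (t[0] <= hl[0]):
                    return False         # conclusion t <= h_l violated
    return True

def check_small_instances(max_len=2, alphabet=(0, 1, 2)):
    """Try every combination of short sequences for Result, l, and r."""
    seqs = [p for n in range(max_len + 1)
            for p in product(alphabet, repeat=n)]
    return all(vc_instance_holds(res, l, r)
               for res in seqs for l in seqs for r in seqs)
```

In every instance the only admissible t is the single element h_r appended to
Result, which is strictly below h_l by the last premise.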



5 Related Work 

Pioneering efforts on automated program verification focused on very simple
data types, in most cases just scalar variables, as the inherent difficulties
were already egregious. As verification techniques progressed and matured,
more complex data types were considered, such as lists (usually à la Lisp),
arrays, maps, and pointers, up to complex dynamic data structures. Arrays in
particular received a lot of attention, both for historical reasons
(programming languages have been offering them natively for decades), and
because they often serve as the basis for implementing more complex data
structures. More generally, a renewed interest in developing decision
procedures for new theories and in integrating existing ones has blossomed
over the last few years. A review of this staggering amount of work is beyond
the scope of this paper; for a partial account and further references we refer
the reader to, e.g., [43, 30] (and [24, 28] for applications). In this section,
we review the approaches that are most similar to ours, in particular those
that yield decidable logics that can be compared directly to our theory of
sequences (see Section 3.1.2). This is the case with several of the works on
the theory of arrays and extensions thereof.



The theory of arrays. McCarthy initiated the research on formal reasoning
about arrays [34]. His theory of arrays defines the axiomatization of the
basic access operations of read and write for quantifier-free formulas and
without arithmetic or extensionality (i.e., the property that if all elements
of two arrays are equal then the arrays themselves are equal). McCarthy's work
has usually been the kernel of every theory of arrays: most works on
(automated) reasoning about arrays extend McCarthy's theory with more complex
(decidable) properties or efficiently automate reasoning within an existing
theory.

Thus, a series of works extended the theory of arrays with arithmetic [32, 5S]
and with sorting predicates on array segments [33]; automated reasoning within
these theories is possible only for restricted classes of programs.
Extensionality is another very significant extension to the theory of arrays
[41], which has now become standard as it is decidable.

The fast technological advances in automated theorem proving over the last 
years have paved the way for efficient implementations of the theory of arrays 
(usually with extensionality) . These implementations use a variety of techniques 
such as SMT solving [3l [HI [3 [HI [T8| , saturation theorem proving |31l [2] , and 
abstraction [9, 26, 27]. Automated invariant inference is an important
application of these decision procedures, which originated a specialized line
of work.

Decidable extensions of the theory of arrays. The last few years have 
seen an acceleration in the development of decidable extensions of the exten- 
sional theory of arrays with more expressive predicates and functions. 

Bradley et al. [5] develop the array property fragment, a decidable subset 
of the $\exists^*\forall^*$ fragment of the theory of arrays. An array property is a formula
of the form $\exists^*\forall^* \bullet \iota \Rightarrow \nu$, where the universal quantification is restricted to
index variables, $\iota$ is a guard on index variables with arithmetic (restricted to
existentially quantified variables), and $\nu$ is a constraint on array values without
arithmetic or nested reads, and where no universally quantified index variable 
is used to select an element that is written to. The array property fragment 
is decidable with a decision procedure that eliminates universal quantification 
on index variables by reducing it to conjunctions on a suitable finite set of 
index values. Extensions of the array property fragment that relax any of the 
restrictions on the form of array properties are undecidable. Bradley et al. also 
show how to adapt their theory of arrays to reason about maps. 

Ghilardi et al. [16] develop "semantic" techniques to integrate decision
procedures into a decidable extension of the theory of arrays. Their
$\mathcal{ADP}$ theory merges the quantifier-free extensional theory of arrays
with dimension and Presburger arithmetic over indices into a decidable logic.
Two extensions of the $\mathcal{ADP}$ theory are still decidable: one with a
unary predicate that determines if an array is injective (i.e., it has no
repeated elements); and one with a function that returns the domain of an
array (i.e., the set of indices that correspond to definite values). Ghilardi
et al. suggest that these extensions might be the basis for automated
reasoning on Separation Logic models. The framework of [16] also supports
other decidable extensions, such as the prefix and sorting predicates, as well
as the map combinator also discussed in [12].

De Moura and Bjørner [12] introduce combinatory array logic, a decidable
extension of the quantifier-free extensional theory of arrays with the map and
constant-value combinators (i.e., array functors). The constant-value
combinator defines an array with all values equal to a constant; the map
combinator applies a $k$-ary function to the elements at position $i$ in $k$
arrays $a_1, \ldots, a_k$. De Moura and Bjørner define a decision procedure for
their combinatory array logic, which is implemented in the Z3 SMT solver.
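
The meaning of the two combinators can be illustrated with arrays modeled as
total functions from indices to values. This is only a sketch of their
semantics under that modeling assumption; it is unrelated to how the decision
procedure or the Z3 implementation works, and the names `const_array`,
`map_array`, and `write` are hypothetical.

```python
def const_array(value):
    """Constant-value combinator: the array whose every element is `value`."""
    return lambda i: value

def map_array(f, *arrays):
    """Map combinator: element i is f applied to the i-th elements of the arrays."""
    return lambda i: f(*(a(i) for a in arrays))

def write(array, index, value):
    """McCarthy-style write: like `array`, except that `index` maps to `value`."""
    return lambda i: value if i == index else array(i)

# Example: pointwise sum of an updated all-zero array and an all-one array.
zeros = const_array(0)
b = write(zeros, 3, 5)
c = map_array(lambda x, y: x + y, b, const_array(1))
```

Here `c(3)` combines the written value 5 with the constant 1, while every
other index combines 0 with 1.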

Habermehl et al. introduce powerful logics to reason about arrays with
integer values [22, 21, 5]; unlike most related work, the decidability of
their logics relies on automata-theoretic techniques for a special class of
counter automata. More precisely, [22] defines the Logic of Integer Arrays
LIA, whose formulas are in the $\exists^*\forall^*$ fragment and allow
Presburger arithmetic on existentially quantified variables, difference and
modulo constraints on index variables, and difference constraints on array
values. Forbidding disjunctions of difference constraints on array values is
necessary to ensure decidability. The resulting fragment is quite expressive,
and in particular it includes practically useful formulas that are
inexpressible in other decidable expressive fragments such as [8]. The
companion work [21] introduces the Single Index Logic SIL, consisting of
existentially quantified Boolean combinations of formulas of the form
$\forall^* \bullet \iota \Rightarrow \nu$, where the universal quantification
is restricted to index variables, $\iota$ is a positive Boolean combination of
bound and modulo constraints on index variables, and $\nu$ is a conjunction of
difference constraints on array values. Again, the restrictions on quantifier
alternations and Boolean combinations are tight in that relaxing one of them
leads to undecidability. The expressiveness of SIL is very close to that of
LIA, and the two logics can be considered two variants of the same basic
kernel. The other work [5] shows how to use SIL to annotate and reason
automatically about array-manipulating programs; the tight correspondence
between SIL and a class of counter automata allows the automatic generation of
loop invariants and hence the automation of the full verification process.

Other approaches. Static analysis and abstract interpretation techniques 
have also been successfully applied to the analysis of array operations, especially 
with the goal of inferring invariants automatically (e.g., [121 dOl 123]).



6 Future Work 

Future work will investigate the decidability of the universal fragment of
$\mathcal{T}_{\mathrm{seq}(\mathbb{Z})}$ extended with "weak" predicates or
functions that slightly increase its expressiveness (such as that outlined at
the end of Section 3.3). We will study to what extent the decision procedure
for the universal fragment of $\mathcal{T}_{\mathrm{seq}(\mathbb{Z})}$ can be
integrated with other decidable logic fragments (and possibly with the dual
existential fragment). We will investigate how to automate the generation of
inductive invariants for sequence-manipulating programs by leveraging the
decidability of the universal fragment of
$\mathcal{T}_{\mathrm{seq}(\mathbb{Z})}$. Finally, we will implement the
decision procedure, integrate it within a verification environment, and assess
its empirical effectiveness on real programs.